Video Encoding

The invention relates to video encoders and in
particular to reducing the computational complexity
when encoding video.

Video encoders and decoders (CODECs) based on video
encoding standards such as H263 and MPEG-4 are well
known in the art of video compression.

The development of these standards has led to the 
ability to send video over much smaller bandwidths 
with only a minor reduction in quality. However, 
decoding and, more specifically, encoding, requires 
a significant amount of computational processing 
resources. For mobile devices, such as personal
digital assistants (PDAs) or mobile telephones,
power usage is closely related to processor
utilisation and therefore relates to the life of the
battery charge. It is obviously desirable to reduce
the amount of processing in mobile devices to
increase the operable time of the device for each
battery charge. In general-purpose personal
computers, CODECs must share processing resources
with other applications. This has contributed to the 
drive to reduce processing utilisation, and 
therefore power drain, without compromising viewing 
quality. 

In many video applications, such as
teleconferences, the majority of the area captured by
the camera is static. In these cases, power 
resources or processor resources are being used 
unnecessarily to encode areas which have not changed 
significantly from a reference video frame. 

The typical steps required to process the pictures 
in a video by an encoder such as one that is H263 or 
MPEG-4 Simple Profile compatible, are described as 
an example.

The first step requires that reference pictures be 
selected for the current picture. These reference 
pictures are divided into non-overlapping
macroblocks. Each macroblock comprises four
luminance blocks and two chrominance blocks, each
block comprising 8 pixels by 8 pixels.

It is well known that the steps in the encoding 
process that typically require the greatest 
computational time are the motion estimation, the
forward discrete cosine transform (FDCT) and the
inverse discrete cosine transform (IDCT).



The motion estimation step looks for similarities 
between the current picture and one or more 
reference pictures. For each macroblock in the 
current picture, a search is carried out to identify 
a prediction macroblock in the reference picture 
which best matches the current macroblock in the 
current picture. The prediction macroblock is
identified by a motion vector (MV) which indicates a
distance offset from the current macroblock. The
prediction macroblock is then subtracted from the
current macroblock to form a prediction error (PE)
macroblock. This PE macroblock is then discrete
cosine transformed, which transforms an image from
the spatial domain to the frequency domain and
outputs a matrix of coefficients relating to the
spectral sub-bands. For most pictures much of the
signal energy is at low frequencies, which is what
the human eye is most sensitive to. The formed DCT
matrix is then quantised, which involves dividing the
DCT coefficients by a quantizer value and then
rounding to the nearest integer. This has the effect
of reducing many of the higher frequency
coefficients to zeros and is the step that will
cause distortion to the image. Typically, the higher
the quantizer step size, the poorer the quality of
the image. The values from the matrix after the
quantizer step are then re-ordered by "zigzag"
scanning. This involves reading the values from the
top left-hand corner of the matrix diagonally back
and forward down to the bottom right-hand corner of
the matrix. This tends to group the zeros together,
which allows the stream to be efficiently run-level
encoded (RLE) before eventually being converted into
a bitstream by entropy encoding. Other "header" data
is usually added at this point.
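
The quantisation, zigzag scan and run-level steps
described above can be sketched as follows. This is
only an illustration in Python, not the encoder's
actual implementation: the function names, the sample
4x4 coefficient block and the step size of 4 are
assumptions made for the example (the text works with
8x8 blocks), and trailing zeros are simply dropped
rather than signalled as a real bitstream would
require.

```python
def quantise(coeffs, q_step):
    """Divide each coefficient by the step size and round to the nearest integer.

    Python's round() resolves exact .5 ties to the even neighbour.
    """
    return [[int(round(c / q_step)) for c in row] for row in coeffs]


def zigzag_order(n):
    """(row, column) positions of an n x n matrix in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + column selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def run_level_encode(values):
    """Return (run, level) pairs: zeros preceding each non-zero value, then the value."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs


# Hypothetical 4x4 block of DCT coefficients (illustrative values only).
coeffs = [[52, 12, 5, 0],
          [-9, 1, 0, 0],
          [3, 0, 0, 0],
          [0, 0, 0, 0]]
levels = quantise(coeffs, q_step=4)
scanned = [levels[r][c] for r, c in zigzag_order(4)]
pairs = run_level_encode(scanned)
```

With these inputs the scan yields 13, 3, -2, 1, 0, 1
followed by ten zeros, so the zeros produced by
quantisation end up in one trailing run.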

If the MV is equal to zero and the quantised DCT 
coefficients are all equal to zero then there is no
need to include encoded data for the macroblock in 
the encoded bitstream. Instead, header information 
is included to indicate that the macroblock has been 
"skipped". 

US 6,192,148 discloses a method for predicting
whether a macroblock should be skipped prior to the
DCT steps of the encoding process. This method
decides whether to complete the steps after the
motion estimation if the MV has been returned as
zero, the mean absolute difference of the luminance
values of the macroblock is less than a first 
threshold and the mean absolute difference of the 
chrominance values of the macroblock is less than a 
second threshold. 

For the total encoding process the motion estimation 
and the FDCT and IDCT are typically the most
processor intensive. The prior art only predicts
skipped blocks after the step of motion estimation
and therefore still contains a step in the process
that can be considered processor intensive.

The present invention discloses a method to predict
skipped macroblocks that requires no motion
estimation or DCT steps.



According to one aspect, the invention provides a
method of encoding video pictures comprising the
steps of:

dividing the picture into regions; and

predicting whether each region requires
processing through further steps by comparing each
region with a reference region.

Hence, the invention avoids unnecessary use of
resources by avoiding processor intensive operations
where possible.



The further steps preferably include motion 
estimation and/or discrete cosine transform steps. 

A region is preferably a non-overlapping macroblock.

A macroblock is preferably a sixteen by sixteen
matrix of pixels.

Preferably, the step of predicting includes two or
more sub-steps.

Preferably, the sub-steps of the predicting step are
calculations.



Further preferably, a reference region is one or 
more macroblocks in the same position in the video
picture but from one or more different reference
time frames as selected by other encoding steps.

Preferably, one of the calculations is whether an
estimate of the energy of some or all pixel values
of the macroblock, optionally divided by the
quantizer step size, is less than a predetermined
threshold value.

Alternatively or further preferably, one of the
calculations is whether an estimate of the values of
certain discrete cosine transform coefficients for
one or more sub-blocks of the macroblock is less
than a second threshold value.

Further preferably, the method of encoding pictures
may be performed by a computer program embodied on a
computer usable medium.

Further preferably, the method of encoding pictures
may be performed by electronic circuitry.

The estimate of the values of certain discrete
cosine transform coefficients may involve:

dividing the sub-blocks into four equal regions;

calculating the sum of absolute differences of the
residual pixel values for each region of the
sub-block, where the residual pixel value is the
corresponding reference pixel luminance value
subtracted from the current pixel luminance value;

estimating the low frequency discrete cosine
transform coefficients for each region of the
sub-blocks, such that:

    Y_{01} = \mathrm{abs}(A + C - B - D)
    Y_{10} = \mathrm{abs}(A + B - C - D)
    Y_{11} = \mathrm{abs}(A + D - B - C)

where Y_{01}, Y_{10} and Y_{11} represent the estimations
of three low frequency discrete cosine transform
coefficients and A, B, C and D represent the sum of
absolute differences of each of the regions of the
sub-block, where A is the top left hand corner, B is
the top right hand corner, C is the bottom left hand
corner and D is the bottom right hand corner; and

selecting the maximum value of the estimate of
the discrete cosine transform coefficients from all
the estimates calculated.
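
The estimate can be illustrated with a short Python
sketch. It assumes an 8x8 sub-block supplied as a
list of eight rows of residual values; the function
names and the sample residual are hypothetical and
chosen only for this example.

```python
def quadrant_sads(residual):
    """Return (A, B, C, D) for an 8x8 residual block given as 8 rows of 8 values.

    A is the top-left 4x4 region, B top-right, C bottom-left, D bottom-right.
    """
    def sad(top, left):
        return sum(abs(residual[r][c])
                   for r in range(top, top + 4)
                   for c in range(left, left + 4))
    return sad(0, 0), sad(0, 4), sad(4, 0), sad(4, 4)


def low_frequency_estimates(a, b, c, d):
    """Estimates Y01, Y10 and Y11 formed from the four region SADs."""
    y01 = abs(a + c - b - d)
    y10 = abs(a + b - c - d)
    y11 = abs(a + d - b - c)
    return y01, y10, y11


def block_estimate(residual):
    """Maximum of the three estimates for one sub-block."""
    return max(low_frequency_estimates(*quadrant_sads(residual)))


# Hypothetical residual: the left half differs by 2, the right half is unchanged.
residual = [[2, 2, 2, 2, 0, 0, 0, 0] for _ in range(8)]
sads = quadrant_sads(residual)
estimates = low_frequency_estimates(*sads)
```

Here A and C are both 32 while B and D are zero, so
only Y01, which compares the left regions with the
right regions, is non-zero.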



The invention will now be described, by way of
example, with reference to the figures of the
drawings in which:

Figure 1 shows a flow diagram of a video picture
encoding process.

Figure 2 shows a flow diagram of a macroblock
encoding process.

Figure 3 shows a flow diagram of a prediction
decision process.

With reference to Figure 1, a first step 102 reads a
picture frame in a video sequence and divides it
into non-overlapping macroblocks (MBs). Each MB
comprises four luminance blocks and two chrominance
blocks, each block comprising 8 pixels by 8 pixels.
Step 104 encodes the MB as shown in Figure 2.
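
A minimal Python sketch of the division performed in
step 102 is given here for the luminance plane only.
The function names, the dictionary layout and the
176x144 test picture are assumptions made for
illustration; the chrominance blocks are omitted.

```python
def split_into_macroblocks(luma, mb_size=16):
    """Map (mb_row, mb_col) to each non-overlapping mb_size x mb_size luminance MB."""
    height, width = len(luma), len(luma[0])
    if height % mb_size or width % mb_size:
        raise ValueError("picture dimensions must be multiples of the MB size")
    return {
        (top // mb_size, left // mb_size):
            [row[left:left + mb_size] for row in luma[top:top + mb_size]]
        for top in range(0, height, mb_size)
        for left in range(0, width, mb_size)
    }


def luminance_blocks(mb):
    """Split a 16x16 luminance MB into its four 8x8 blocks, in raster order."""
    return [[row[left:left + 8] for row in mb[top:top + 8]]
            for top in (0, 8) for left in (0, 8)]


# Hypothetical 176x144 luminance plane filled with a simple gradient.
luma = [[(x + y) % 256 for x in range(176)] for y in range(144)]
mbs = split_into_macroblocks(luma)
blocks = luminance_blocks(mbs[(0, 0)])
```

A 176x144 picture divides into 11 by 9 macroblocks,
so the dictionary holds 99 entries.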

With reference to Figure 2, an MB encoding process
104 is shown, where a decision step 202 is performed
before any other step.

The H263 encoding process currently teaches
that each MB in the video encoding process typically
goes through the steps 204 to 226 or equivalent
processes, in the order shown in Figure 2 or in a
different order. Motion estimation step 204
identifies one or more prediction MB(s), each of
which is defined by an MV indicating a distance
offset from the current MB and a selection of a
reference picture. Motion compensation step 206
subtracts the prediction MB from the current MB to
form a Prediction Error (PE) MB. If the value of MV
requires to be encoded (step 208), then MV is
entropy encoded (step 210), optionally with reference
to a predicted MV.

Each block of the PE MB is then forward discrete
cosine transformed (FDCT) 212, which outputs a block
of coefficients representing the spectral sub-bands
of each of the PE blocks. The coefficients of the
FDCT block are then quantized (for example through
division by a quantizer step size) 214 and then
rounded to the nearest integer. This has the effect
of reducing many of the coefficients to zero. If
there are any non-zero quantized coefficients
(QCoeff) 216 then the resulting block is entropy
encoded by steps 218 to 222.

In order to form a reconstructed picture for further
predictions, the quantized coefficients (QCoeff) are
re-scaled (for example by multiplication by a
quantizer step size) 224 and transformed with an
inverse discrete cosine transform (IDCT) 226. After
the IDCT the reconstructed PE MB is added to the
reference MB and stored for further prediction.
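
The transform, quantisation and reconstruction path
(steps 212, 214, 224 and 226) can be sketched for a
single 8x8 block as follows. This is an illustrative
Python sketch under stated assumptions: it uses a
directly evaluated orthonormal DCT and plain division
by the step size, whereas a practical encoder would
use a fast transform and the quantiser rules of the
chosen standard. All names and sample values are
hypothetical.

```python
import math

N = 8  # block size used by the transform


def _scale(k):
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(x, u):
    return math.cos((2 * x + 1) * u * math.pi / (2 * N))


def fdct(block):
    """Orthonormal 2-D DCT-II of an N x N block, evaluated directly."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct(coeffs):
    """Inverse of fdct()."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def quantise(coeffs, q_step):
    """Divide by the step size and round to the nearest integer."""
    return [[int(round(c / q_step)) for c in row] for row in coeffs]


def rescale(levels, q_step):
    """Multiply the quantised levels back by the step size."""
    return [[level * q_step for level in row] for row in levels]


def reconstruct(pe_block, reference_block, q_step):
    """Return the quantised levels and the reconstructed block."""
    levels = quantise(fdct(pe_block), q_step)
    pe_rec = idct(rescale(levels, q_step))
    recon = [[ref + pe for ref, pe in zip(ref_row, pe_row)]
             for ref_row, pe_row in zip(reference_block, pe_rec)]
    return levels, recon


# Hypothetical flat prediction error of 8 on a flat reference of 100.
pe = [[8] * N for _ in range(N)]
ref = [[100] * N for _ in range(N)]
levels, recon = reconstruct(pe, ref, q_step=10)
```

In this example only the lowest-frequency coefficient
survives quantisation, and the reconstructed samples
are close to 107.5 rather than 108, which shows the
distortion introduced by the quantizer.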



The decision step 228 looks at the output of the
prior processes and if the MV is equal to zero and
all the QCoeffs are zero then the encoded
information is not written to the bitstream but a
skip MB indication is written instead. This means
that all the processing time that has been used to
encode the MB has not been necessary because the MB
is regarded as similar to or the same as the
previous MB.
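
The test applied at decision step 228 amounts to the
following check, shown as a small Python sketch. The
representation of the MV as a pair of offsets and of
the MB as a list of quantised 8x8 blocks is an
assumption made for illustration.

```python
def is_skipped(mv, quantised_blocks):
    """True when the MV is zero and every quantised coefficient is zero."""
    coefficients_all_zero = all(
        level == 0
        for block in quantised_blocks
        for row in block
        for level in row
    )
    return tuple(mv) == (0, 0) and coefficients_all_zero


# Hypothetical MB: four luminance and two chrominance blocks, all quantised to zero.
zero_blocks = [[[0] * 8 for _ in range(8)] for _ in range(6)]
skip = is_skipped((0, 0), zero_blocks)
```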

Decision step 202 predicts whether the current MB is
likely to be skipped, that is, that after the process
steps 202 - 226, the MB is not coded but a skip
indication is written instead. If the decision step
202 does predict that the MB would be skipped, the MB
is not passed on to the step 204 and the following
process steps but skip information is passed
directly to step 232.






With reference to Figure 3, a flow diagram is shown
of the decision to skip the MB 202.

MBs that are skipped have zero MV and QCoeff. Both
of these conditions are likely to be met if there is
a strong similarity between the current MB and the
same MB position in the reference frame. The energy
of a residual MB formed by subtracting the reference
MB, without motion compensation, from the current MB
is approximated by the sum of absolute differences
for the luminance part of the MB with zero
displacement (SAD0_MB) given by:

    SAD0_{MB} = \sum_{i=0}^{15} \sum_{j=0}^{15} \lvert C_C(i,j) - C_P(i,j) \rvert    (Equation 1)

C_C(i,j) and C_P(i,j) are luminance samples from an MB
in the current frame and in the same position in
the reference frame respectively.
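
Equation 1 translates directly into a few lines of
Python. The sketch assumes each MB is supplied as 16
rows of 16 luminance samples; the function name and
the sample values are hypothetical.

```python
def sad0_mb(current, reference):
    """Sum of absolute differences between co-located luminance MBs (Equation 1)."""
    return sum(abs(c - p)
               for cur_row, ref_row in zip(current, reference)
               for c, p in zip(cur_row, ref_row))


# Hypothetical MBs: identical apart from two samples.
reference = [[100] * 16 for _ in range(16)]
current = [row[:] for row in reference]
current[0][0] = 110
current[15][15] = 97
sad0 = sad0_mb(current, reference)
```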

The relationship between SAD0_MB and the probability
that the MB will be skipped also depends on the
quantizer step size since a higher step size
typically results in an increased proportion of
skipped MBs.

A comparison of the calculation SAD0_MB (optionally
divided by the quantizer step size (Q)) 302 to a
first threshold value gives a first comparison step
304. If the calculated value is greater than a first
threshold value then the MB is passed to step 204
and enters a normal encoding process. If the
calculated value is less than a first threshold
value then a second calculation is performed 306.

Step 306 performs additional calculations on the
residual MB. Each 8x8 luminance block is divided
into four 4x4 blocks. A, B, C and D (Equation 2) are
the SAD values of each 4x4 block and R(i,j) are the
residual pixel values without motion compensation.



    A = \sum_{i=0}^{3} \sum_{j=0}^{3} \lvert R(i,j) \rvert
    B = \sum_{i=0}^{3} \sum_{j=4}^{7} \lvert R(i,j) \rvert
    C = \sum_{i=4}^{7} \sum_{j=0}^{3} \lvert R(i,j) \rvert
    D = \sum_{i=4}^{7} \sum_{j=4}^{7} \lvert R(i,j) \rvert    (Equation 2)

Y01, Y10 and Y11 (Equation 3) provide a low-complexity
estimate of the magnitudes of the three low
frequency DCT coefficients coeff(0,1), coeff(1,0)
and coeff(1,1) respectively. If any of these
coefficients is large then there is a high
probability that the MB should not be skipped.
Y4x4block (Equation 4) is therefore used to predict
whether each block may be skipped. The maximum for
the luminance part of a macroblock is calculated
using Equation 5.

    Y_{01} = \mathrm{abs}(A + C - B - D)
    Y_{10} = \mathrm{abs}(A + B - C - D)
    Y_{11} = \mathrm{abs}(A + D - B - C)    (Equation 3)

    Y_{4x4block} = \max(Y_{01}, Y_{10}, Y_{11})    (Equation 4)

    Y_{4x4max} = \max(Y_{4x4block1}, Y_{4x4block2}, Y_{4x4block3}, Y_{4x4block4})    (Equation 5)

The calculated value of Y4x4max is compared with a
second threshold 308. If the calculated value is
less than a second threshold then the MB is skipped
and the next step in the process is 232. If the
calculated value is greater than a second threshold
then the MB is passed to step 204 and the subsequent
steps for encoding.
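
Taken together, steps 302 to 308 can be sketched in
Python as a single function operating on a 16x16
luminance MB. The sketch makes several assumptions
that are not fixed by the text: SAD0_MB is always
divided by the quantizer step size, a value exactly
equal to a threshold is treated as "not skipped", and
the threshold values used in the example are
arbitrary.

```python
def predict_skip(current, reference, q_step, threshold1, threshold2):
    """Predict whether a 16x16 luminance MB can be skipped (steps 302 to 308)."""
    residual = [[c - p for c, p in zip(cur_row, ref_row)]
                for cur_row, ref_row in zip(current, reference)]

    # Steps 302 and 304: SAD0_MB scaled by the quantizer step size.
    sad0 = sum(abs(r) for row in residual for r in row)
    if sad0 / q_step >= threshold1:
        return False  # normal encoding, starting at step 204

    def sad4x4(top, left):
        return sum(abs(residual[r][c])
                   for r in range(top, top + 4)
                   for c in range(left, left + 4))

    # Step 306: largest low-frequency estimate over the four 8x8 blocks.
    y_max = 0
    for top in (0, 8):
        for left in (0, 8):
            a, b = sad4x4(top, left), sad4x4(top, left + 4)
            c, d = sad4x4(top + 4, left), sad4x4(top + 4, left + 4)
            y_max = max(y_max,
                        abs(a + c - b - d),
                        abs(a + b - c - d),
                        abs(a + d - b - c))

    # Step 308: compare the maximum estimate with the second threshold.
    return y_max < threshold2


def with_patch(base, delta):
    """Copy of base with delta added to its top-left 4x4 samples (test helper)."""
    out = [row[:] for row in base]
    for r in range(4):
        for c in range(4):
            out[r][c] += delta
    return out


# Hypothetical data and thresholds, chosen only to exercise each branch.
reference = [[100] * 16 for _ in range(16)]
unchanged = predict_skip(reference, reference, 4, 20, 30)
small_change = predict_skip(with_patch(reference, 1), reference, 4, 20, 30)
local_change = predict_skip(with_patch(reference, 3), reference, 4, 20, 30)
```

The third case passes the first comparison because
its total difference is small, but the difference is
concentrated in one 4x4 region, so the low-frequency
estimate exceeds the second threshold and the MB is
sent for normal encoding.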

These steps typically have very little impact on
computational complexity. SAD0_MB is normally computed
in the first step of any motion estimation algorithm
and so there is no extra calculation required.
Furthermore, the SAD values of each 4x4 block (A, B,
C and D in Equation 2) may be calculated without
penalty if SAD0_MB is calculated by adding together
the values of SAD for each 4x4-sample sub-block in
the MB.
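
One way to obtain SAD0_MB and the 4x4 SAD values in a
single pass is sketched below in Python. The grid
layout, the function names and the sample data are
assumptions made for illustration.

```python
def sad_grid_4x4(current, reference):
    """4x4 grid of SAD values, one per 4x4-sample sub-block of a 16x16 MB."""
    return [[sum(abs(current[r][c] - reference[r][c])
                 for r in range(top, top + 4)
                 for c in range(left, left + 4))
             for left in range(0, 16, 4)]
            for top in range(0, 16, 4)]


def sad0_and_quadrants(current, reference):
    """Return SAD0_MB and the (A, B, C, D) values of each 8x8 luminance block."""
    grid = sad_grid_4x4(current, reference)
    sad0 = sum(sum(row) for row in grid)  # SAD0_MB as the sum of the partial SADs
    quadrants = {
        (br, bc): (grid[2 * br][2 * bc], grid[2 * br][2 * bc + 1],
                   grid[2 * br + 1][2 * bc], grid[2 * br + 1][2 * bc + 1])
        for br in (0, 1) for bc in (0, 1)
    }
    return sad0, quadrants


# Hypothetical MB: changes confined to the first and last 4x4 sub-blocks.
reference = [[0] * 16 for _ in range(16)]
current = [[0] * 16 for _ in range(16)]
for r in range(4):
    for c in range(4):
        current[r][c] = 1
        current[12 + r][12 + c] = 2
sad0, quadrants = sad0_and_quadrants(current, reference)
```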

The additional computational requirements of the
classification algorithm are the operations in
Equations 3, 4 and 5 and these are typically not
computationally intensive.